Taxonogenomics of Culturomica massiliensis gen. nov., sp. nov., and Emergencia timonensis gen. nov., sp. nov., new bacteria isolated from human stool microbiota

Two new bacterial strains, Marseille-P2698T (CSUR P2698 = DSM 103121) and Marseille-P2260T (CSUR P2260 = DSM 101844 = SN18), were isolated from human stools by the culturomics method. We used the taxonogenomic approach to fully describe these two strains. Strain Marseille-P2698T was a Gram-negative, motile, non-spore-forming, rod-shaped bacterium. Strain Marseille-P2260T was a Gram-positive, motile, spore-forming, rod-shaped bacterium. The major fatty acids of strain Marseille-P2698T were C15:0 iso (63%), C15:0 anteiso (11%), and C17:0 3-OH iso (8%); those of strain Marseille-P2260T were C16:0 (39%), C18:1n9 (16%), and C18:1n7 (14%). Strains Marseille-P2698T and Marseille-P2260T shared 16S rRNA gene sequence similarities of 91.50% with Odoribacter laneusT, and of 90.98% and 95.07% with Odoribacter splanchnicusT and Eubacterium sulciT, respectively. Compared with their closest related species, O. splanchnicusT and E. sulciT, respectively, they exhibited digital DNA-DNA hybridization (dDDH) values lower than 20.7% and orthologous average nucleotide identity (OrthoANI) values lower than 73%. Phenotypic, biochemical, phylogenetic, and genomic results obtained by comparative analyses provided sufficient evidence that strains Marseille-P2698T and Marseille-P2260T represent new species of new genera, for which the names Culturomica massiliensis gen. nov., sp. nov., and Emergencia timonensis gen. nov., sp. nov. are proposed, respectively.

The human gut microbiota is currently considered one of the most active research fields in microbiology 1 . This microbial community harbours a huge diversity of bacteria, a large part of which remains unknown 2 . Researchers have used a variety of strategies to speed up and simplify the description of new bacterial species by optimizing their in vitro growth conditions [3][4][5] . Culturomics, one of these strategies, relies on diversified culture conditions and has allowed the identification of several new bacterial species from the human gastrointestinal tract [3][4][5] . Since 2012, it has allowed the isolation of over 1000 different human-associated bacterial species, including several hundred new species [3][4][5][6] . This method highlighted the need to adopt taxonomic approaches in clinical microbiology that use modern, reproducible tools such as high-throughput genomic and proteomic analyses.
On November 30, 2015, two putative new bacterial species, Culturomica massiliensis gen. nov., sp. nov., and Emergencia timonensis gen. nov., sp. nov., were isolated from patients' stools and partially described 7,8 . Genome sequencing of newly described bacterial species is currently a necessary step for comparing them taxonogenomically with their closest known relatives. Indeed, several recent publications have used genomic descriptions to characterize new species by comparison with their closest relative strains [9][10][11] . The aim of the present study was to complete the phenotypic, taxonomic, and genomic characterization of Culturomica massiliensis gen. nov., sp. nov., strain Marseille-P2698T, and Emergencia timonensis gen. nov., sp. nov., strain Marseille-P2260T, and to formally propose both new genera and species.

Materials and methods
Sample collection and ethics approval. On November 30, 2015, stool samples were collected from hospitalized patients at the Timone Hospital (Marseille, France) as part of a study of human microbiota diversity. Patients provided signed informed consent 7,8 . The study protocol was approved by the ethics committee of the Institut Fédératif de Recherche 48 under agreement number 09-022. All methods were performed in accordance with the relevant guidelines and regulations. Each sample was cultured according to the culturomics method previously established in our laboratory 3,5 . Various types of bacterial colonies were isolated on Columbia agar enriched with 5% sheep blood (bioMérieux®, Marcy l'Etoile, France). Colonies were then identified using a Matrix-Assisted Laser Desorption Ionization-Time Of Flight Mass Spectrometry (MALDI-TOF MS) instrument (Bruker Daltonics®, Bremen, Germany) as previously reported 12 . Both strains studied herein had MALDI-TOF scores below 2.0, which did not allow reliable identification. Their spectra were then added to the local MALDI-TOF MS database (https://www.mediterranee-infection.com/urms-data-base).
16S rRNA gene sequencing and identification. The 16S rRNA gene sequences of both strains were extracted directly from their whole-genome sequences and compared with the non-redundant (nr) database using the Basic Local Alignment Search Tool for nucleotides (BLASTn) 13 . The resulting sequence similarity percentages identified the closest species to each strain and indicated whether a strain might represent a new species (similarity < 98.65%). A phylogenetic tree was then constructed from these 16S rRNA gene sequences and those of the closest related species of each studied strain. Sequences of the designated species were downloaded from nr 14 and aligned with ClustalW. Phylogenetic trees were constructed using MEGA 11 (version 11.0.10) with the maximum likelihood method and 1000 bootstrap replications 15 .
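The 98.65% cut-off above amounts to a simple decision rule. The Python sketch below is illustrative only: it assumes a pairwise alignment of two 16S rRNA gene sequences is already available (BLASTn computes its own identity values internally), and the helper names `percent_identity` and `is_putative_new_species` are hypothetical, not part of any tool cited here.

```python
# Cut-off below which a strain may represent a new species (value taken from the text).
NEW_SPECIES_THRESHOLD = 98.65


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identity (%) between two pre-aligned sequences of equal length.

    Identical non-gap positions count as matches; columns where both sequences
    have a gap are ignored (an assumption, similar in spirit to BLAST identity).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment contains no informative columns")
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def is_putative_new_species(similarity: float,
                            threshold: float = NEW_SPECIES_THRESHOLD) -> bool:
    """True when 16S rRNA similarity to the closest type strain is below the cut-off."""
    return similarity < threshold


# Toy alignment (hypothetical), then the similarities reported in the Results.
print(round(percent_identity("ACGT-A", "ACGTTA"), 2))
print(is_putative_new_species(91.5), is_putative_new_species(92.72))
```

A 16S rRNA similarity below the cut-off only flags a candidate; the genomic criteria described later are what support the species proposal.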
The morphology and motility were observed using a new-generation scanning electron microscope (Hitachi High-Technologies Corporation, Tokyo, Japan). In addition, three semi-quantitative standardized micro-methods of the Analytical Profile Index (API®, bioMérieux®), namely API® 20A, API® 50 CH, and API® ZYM, were used according to the manufacturer's instructions 16 to study carbohydrate metabolism and enzymatic activities.
Fatty acid methyl ester (FAME) analysis was performed by gas chromatography/mass spectrometry (GC/MS), as previously reported 17,18 . FAMEs were separated on an Elite 5-MS column and monitored by mass spectrometry (Clarus 500-SQ 8 S, Perkin Elmer®, Courtaboeuf, France). The spectra obtained were compared with those in reference databases using MS Search 2.0 with the Standard Reference Database 1A (National Institute of Standards and Technology, NIST, Gaithersburg, USA) and the FAMEs mass spectral database (Wiley, Chichester, UK).
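The fatty acid percentages reported for each strain are relative values. The sketch below assumes they are obtained by area normalization, that is, each identified FAME peak area divided by the summed area of all identified peaks; the text does not state the calculation, the function name is hypothetical, and the peak areas in the example are arbitrary toy numbers, not measured data.

```python
def relative_abundance(peak_areas: dict[str, float]) -> dict[str, float]:
    """Express each FAME peak area as a percentage of the total identified peak area."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


# Toy peak areas (arbitrary units), not measurements from this study.
profile = relative_abundance({"C16:0": 2.0, "C18:1n9": 1.0, "C18:1n7": 1.0})
for name, pct in sorted(profile.items(), key=lambda item: -item[1]):
    print(f"{name}: {pct:.1f}%")
```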
Whole-genome sequencing and bioinformatic analyses. Bacterial DNA was first extracted using the EZ1 DNeasy Blood & Tissue Kit (Qiagen® GmbH, Hilden, Germany) according to the manufacturer's protocol 19 . Whole-genome sequencing was performed on an Illumina® MiSeq sequencer (Illumina®, San Diego, CA, USA) 20 . The sequenced genomes were then assembled using SPAdes 3.5.0 21 , which reduces the number of mismatches and short indels. Contigs shorter than 700 bp were removed. Finally, the quality of each sequenced genome was checked using BLAST against the nr/nt database. This check assessed how the assembly of each new species submitted to the International Nucleotide Sequence Database Collaboration (INSDC), i.e., DDBJ, ENA, or GenBank, relates to the assembly represented in the NCBI Reference Sequence (RefSeq) project. The global statistics reported included gaps between scaffolds, number of scaffolds, number of contigs, total sequence length, and total ungapped length. Furthermore, the taxonomic assignment of each new species was checked against the best-matching type strain in NCBI 22 .
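The contig-length filter and some of the global statistics listed above can be sketched in plain Python. This is not the SPAdes or NCBI procedure: it assumes the assembly is available as FASTA text, treats "N" characters as gaps when computing the ungapped length, and uses hypothetical function names.

```python
MIN_CONTIG_LENGTH = 700  # contigs shorter than this are discarded (value from the text)


def parse_fasta(text: str) -> dict[str, str]:
    """Minimal FASTA parser: maps each header (without '>') to its concatenated sequence."""
    records: dict[str, str] = {}
    name = None
    chunks: list[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        records[name] = "".join(chunks)
    return records


def filter_and_summarize(records: dict[str, str], min_length: int = MIN_CONTIG_LENGTH):
    """Drop short contigs and report simple global statistics for the remaining ones.

    Runs of 'N' are treated as gaps, so the ungapped length excludes them (an assumption).
    """
    kept = {name: seq for name, seq in records.items() if len(seq) >= min_length}
    total = sum(len(seq) for seq in kept.values())
    ungapped = sum(len(seq) - seq.count("N") for seq in kept.values())
    return kept, {"contigs": len(kept), "total_length": total, "ungapped_length": ungapped}


# Toy assembly (hypothetical): the 100-bp contig falls below the cut-off and is discarded.
toy = ">c1\n" + "A" * 800 + "\n>c2\n" + "C" * 100 + "\n>c3\n" + "G" * 700 + "N" * 10 + "\n"
kept, stats = filter_and_summarize(parse_fasta(toy))
print(sorted(kept), stats)
```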
During annotation, genomic features were evaluated, including transfer-messenger RNAs (tmRNAs) and transfer RNAs (tRNAs), detected using ARAGORN version 1.2, and ribosomal RNAs (rRNAs), detected using Barrnap version 0.9 23,24 . The generated protein FASTA file (.faa) was used for BLASTP searches against the Clusters of Orthologous Groups (COGs) database and for CRISPR-Cas identification 25 . Resistance genes were screened using ResFinder 26 . Other bioinformatic tools were also used, such as antiSMASH to search for polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) 27 . Circular maps of the two genomes were generated using the CGView (Circular Genome Viewer) software, a Java application that converts XML or tab-delimited input into vector graphics 28 .
In addition, phylogenetic trees were generated with FastME 2.1.6.1 to highlight the position of each new bacterial strain among its closest relatives 29 . Digital DNA-DNA hybridization (dDDH) values between genomes were calculated using the Genome-to-Genome Distance Calculator web server (https://ggdc.dsmz.de). A cut-off of 70% was applied, below which a prokaryotic strain may be considered to represent a new species 30 . Orthologous average nucleotide identity (OrthoANI) version 0.93.1 was also used to calculate genomic similarities between the studied species and their related taxa.
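The two genomic cut-offs can be combined into a single check. In the sketch below, the 70% dDDH threshold comes from the text and the 95% OrthoANI threshold is the one applied in the Results; requiring both to be met is an assumption of this example, and the function name is hypothetical. The sketch does not compute dDDH or OrthoANI, which come from the GGDC web server and the OrthoANI tool.

```python
DDH_THRESHOLD = 70.0  # %, dDDH species cut-off stated in the text
ANI_THRESHOLD = 95.0  # %, OrthoANI cut-off applied in the Results


def species_delineation(ddh: float, ani: float) -> dict[str, bool]:
    """Compare one genome pair against both cut-offs.

    'distinct_species' requires both values to fall below their thresholds,
    a conservative combination chosen for this sketch.
    """
    below_ddh = ddh < DDH_THRESHOLD
    below_ani = ani < ANI_THRESHOLD
    return {"below_ddh": below_ddh, "below_ani": below_ani,
            "distinct_species": below_ddh and below_ani}


# Inputs are the upper bounds reported in the Abstract for the closest relatives.
print(species_delineation(ddh=20.7, ani=73.0))
```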

Results
Strain identification and phylogenetic analyses. The names Culturomica massiliensis gen. nov., sp. nov., and Emergencia timonensis gen. nov., sp. nov. had previously been proposed for these new species, with strains Marseille-P2698T and Marseille-P2260T, respectively, as representative strains 7,8 . As these previous descriptions were not exhaustive, we revisited the work by including phylogenetic, morphological, and genomic data. Strain Marseille-P2698T shared 91.5% 16S rRNA gene sequence similarity (99% query coverage) with Odoribacter laneus strain YIT 12061T (Fig. 1). Strain Marseille-P2260T shared 92.72% 16S rRNA gene sequence similarity (100% query coverage) with Eubacterium sulci strain ATCC 35585T (Fig. 1).
Phenotypic and biochemical characterizations. Strains Marseille-P2698T and Marseille-P2260T grew on Columbia agar enriched with 5% sheep blood (bioMérieux®) after 48 h of incubation at 37 °C under strictly anaerobic conditions. Optimal growth was obtained at pH 7. Strain Marseille-P2260T did not tolerate NaCl, whereas strain Marseille-P2698T grew at a NaCl concentration of 0.5%.
Strain Marseille-P2698T is a strictly anaerobic, motile, non-spore-forming, Gram-negative rod, 1.5 to 3 μm long and 0.3 to 0.4 μm in diameter. It is catalase-positive and oxidase-negative. Colonies are circular and beige, with a diameter of 0.7 to 1.2 mm.
Genomic properties and analyses. The whole genome of strain Marseille-P2698T consisted of 14 contigs, with a total size of 4,410,591 bp and a G+C content of 43 mol% (Fig. 2). This genome contained 3679 genes, of which 3487 were protein-coding genes. In addition, 59 RNA sequences were identified, distributed as follows: 8 rRNAs (three 16S, two 23S, and three 5S), 51 tRNAs, and 1 tmRNA.
The genome of strain Marseille-P2260T consisted of 9 contigs, with a size of 4,661,482 bp and a G+C content of 45.8 mol% (Fig. 2). Genome annotation identified 4380 genes, of which 4288 were protein-coding genes. There were 56 RNA sequences, including 5 rRNAs (one 16S, one 23S, and three 5S), 51 tRNAs, and 1 tmRNA.
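Genome G+C content, such as the 43 and 45.8 mol% values above, is the proportion of G and C among the nucleotides of all contigs. The sketch below assumes ambiguous bases such as "N" are excluded from the denominator; the text does not name the tool used, the function name is hypothetical, and the contigs in the example are toy sequences.

```python
from typing import Iterable


def gc_content_mol_percent(contigs: Iterable[str]) -> float:
    """G+C content (mol%) over all contigs: (G + C) / (A + C + G + T) x 100.

    Ambiguous bases such as 'N' are excluded from the denominator (an assumption).
    """
    gc = acgt = 0
    for seq in contigs:
        s = seq.upper()
        g_c = s.count("G") + s.count("C")
        gc += g_c
        acgt += g_c + s.count("A") + s.count("T")
    if acgt == 0:
        raise ValueError("no unambiguous bases found")
    return 100.0 * gc / acgt


# Toy contigs (hypothetical): 4 G/C among 8 unambiguous bases; the N's are ignored.
print(gc_content_mol_percent(["ATGC", "GCNNAT"]))
```

Pooling counts across contigs, rather than averaging per-contig percentages, keeps short contigs from being over-weighted.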
Odoribacter laneusT, O. splanchnicusT, Butyricimonas synergisticaT, B. faecihominisT, and B. virosaT had genome sizes ranging from 3.77 to 4.81 Mbp. The closest relatives of strains Marseille-P2698T and Marseille-P2260T are presented in Tables 3 and 4.
The phylogenetic relationships of strains Marseille-P2698T and Marseille-P2260T with related strains, based on whole-genome sequences, are shown in Fig. 3. OrthoANI values between the two strains and their closest relatives were lower than 95%, also suggesting that strains Marseille-P2698T and Marseille-P2260T belong to distinct species (Fig. 4).
Protein-coding genes were assigned to COG functional categories, although some had unknown functions. The distributions of COG functional categories in strains Marseille-P2698T and Marseille-P2260T differed from those of their closest species (Fig. 5).
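A comparison of COG functional distributions like the one in Fig. 5 can be sketched as per-category percentages followed by percentage-point differences. This is an illustration only: it assumes each gene carries a single one-letter COG category (in practice some genes map to several), uses "-" as a hypothetical label for genes without a COG hit, and the gene assignments in the example are toy data.

```python
from collections import Counter


def cog_category_percentages(assignments: list[str]) -> dict[str, float]:
    """Percentage of protein-coding genes in each COG functional category.

    Each list element is the one-letter COG category assigned to one gene; genes
    without a COG hit are labelled "-" here (a hypothetical convention).
    """
    if not assignments:
        raise ValueError("no gene assignments given")
    counts = Counter(assignments)
    return {cat: 100.0 * n / len(assignments) for cat, n in sorted(counts.items())}


def percentage_point_differences(p: dict[str, float], q: dict[str, float]) -> dict[str, float]:
    """Difference (p - q) in percentage points for every category seen in either genome."""
    return {cat: p.get(cat, 0.0) - q.get(cat, 0.0) for cat in sorted(set(p) | set(q))}


# Toy assignments (hypothetical), not the annotations of either strain.
new_strain = cog_category_percentages(["J", "J", "K", "-"])
relative = cog_category_percentages(["J", "K", "K", "K"])
print(percentage_point_differences(new_strain, relative))
```

Normalizing to percentages before comparing keeps genomes with different gene counts, such as 3487 versus 4288 protein-coding genes, on the same scale.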
The genome and 16S rRNA gene sequences of strain Marseille-P2698T are deposited in the GenBank database under accession numbers FLSN00000000 and LT558805, respectively.
Description of Emergencia gen. nov. Emergencia (e.mer.gen'cia. N.L. fem. n. Emergencia, from emergence, in reference to the discovery of emerging human bacteria).
The genome of strain Marseille-P2260T is 4.66 Mbp in size, with a G+C content of 45.8 mol%.
The type strain Marseille-P2260T (CSUR P2260 = DSM 101844 = SN18) was isolated from a stool sample of a healthy patient with an unremarkable medical history.

Data availability
The genome and 16S rRNA gene sequences are deposited in the GenBank database under accession numbers FLSN00000000 and LT558805, respectively, for strain Marseille-P2698T, and FLKM00000000 and LN998061, respectively, for strain Marseille-P2260T.